Partial Monitoring - Classification, Regret Bounds, and Algorithms

Authors

  • Gábor Bartók
  • Dean P. Foster
  • Dávid Pál
  • Alexander Rakhlin
  • Csaba Szepesvári
Abstract

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, Θ̃(√T), Θ(T^{2/3}), or Θ(T). We provide computationally efficient learning algorithms that achieve the minimax regret within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.
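
The regret notion described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the toy loss matrix `LOSS`, the feedback matrix `FEEDBACK`, and the action and outcome sequences are all hypothetical and chosen only for illustration. The sketch assumes a finite game given by two tables indexed by (action, outcome), and it computes the feedback the learner would observe and the regret against the best fixed action in hindsight.

```python
from typing import Hashable, List, Sequence


def feedback_signals(
    feedback: Sequence[Sequence[Hashable]],
    actions: Sequence[int],
    outcomes: Sequence[int],
) -> List[Hashable]:
    """Signals the learner observes: a fixed function of (action, outcome)."""
    return [feedback[a][o] for a, o in zip(actions, outcomes)]


def cumulative_regret(
    loss: Sequence[Sequence[float]],
    actions: Sequence[int],
    outcomes: Sequence[int],
) -> float:
    """Learner's total loss minus the total loss of the best fixed action."""
    learner_loss = sum(loss[a][o] for a, o in zip(actions, outcomes))
    # Evaluate every fixed action (one row each) on the same outcome sequence.
    best_fixed_loss = min(sum(row[o] for o in outcomes) for row in loss)
    return learner_loss - best_fixed_loss


# Hypothetical toy game: 2 actions (rows) and 2 outcomes (columns).
LOSS = [[0, 1], [1, 0]]
# Assumed feedback table: action 0 reveals the outcome, action 1 reveals nothing.
FEEDBACK = [["a", "b"], ["c", "c"]]

ACTIONS = [0, 0, 0, 1]
OUTCOMES = [0, 1, 1, 1]

print(feedback_signals(FEEDBACK, ACTIONS, OUTCOMES))  # ['a', 'b', 'b', 'c']
print(cumulative_regret(LOSS, ACTIONS, OUTCOMES))     # 1
```

In this toy run the learner's total loss is 2 and the best fixed action (action 1) has total loss 1, so the regret is 1. Note that the learner never sees the losses directly, only the signals, which is what distinguishes partial monitoring from full-information settings.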

Similar resources

Surrogate Regret Bounds for the Area Under the ROC Curve via Strongly Proper Losses

The area under the ROC curve (AUC) is a widely used performance measure in machine learning, and has been widely studied in recent years particularly in the context of bipartite ranking. A dominant theoretical and algorithmic framework for AUC optimization/bipartite ranking has been to reduce the problem to pairwise classification; in particular, it is well known that the AUC regret can be form...

Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games

Partial monitoring games are repeated games where the learner receives feedback that might be different from adversary’s move or even the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed [1], where the learner’s action space can be exponentially large and adversary samples its moves from a bounded, continuous space, according t...

Robust approachability and regret minimization in games with partial monitoring

Approachability has become a standard tool in analyzing learning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where there is ambiguity in the obtained reward that belongs to a set, rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficie...

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial feedback model motivated by second-price auctions, we obtain algorithms for Lipsch...

No-Regret Algorithms for Unconstrained Online Convex Optimization

Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x̊ are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds...

Journal:
  • Math. Oper. Res.

Volume 39

Publication date: 2014